Company Mention Detection for Large Scale Text Mining
Authors
Abstract
Text mining on a large scale that addresses actionable prediction needs to contend with noisy information in documents, and with interdependencies between the kinds of NLP techniques applied and the data representation of instances. This paper presents an initial investigation of the impact of improved company mention detection for financial analytics. Coverage of company mention detection improves dramatically. The improvement in stock price prediction varies, depending on the data representation.
Similar resources
An open-source framework for large-scale, flexible evaluation of biomedical text mining systems
BACKGROUND: Improved evaluation methodologies have been identified as a necessary prerequisite to the improvement of text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. The extensibility of this framework and its ability to uncover system-wide characteristics by analyzi...
What did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text
We describe a large, high-quality benchmark for the evaluation of Mention Detection tools. The benchmark contains annotations of both named entities and other types of entities, annotated on different types of text, ranging from clean text taken from Wikipedia to noisy spoken data. The benchmark was built through a highly controlled crowdsourcing process to ensure its quality. We descr...
Experimenting with Anomaly Detection by Mining Large-scale Information Networks
Social networks have formed the basis of many studies into large network analysis. Much is already known about efficient algorithms for large network analysis, data mining, knowledge diffusion, anomaly detection, and viral marketing, to mention a few. More recent research is focusing on new classes of efficient approximate algorithms that can scale to billions of nodes and edges. To this end, t...
Comparison of Structured vs. Unstructured Data for Industrial Quality Analysis
Industrial methods for quality analysis rely heavily on structured data describing product features and product usage. The analysis of such data is normally done using complex reporting or sophisticated data mining methods. Besides this structured data, companies very often also possess large amounts of unstructured text such as call center reports, internet fora, or repair order documents. Despit...
Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining
Industrial-scale R&D IT projects depend on many sub-technologies, which need to be understood and have their risks analysed before the project can begin if it is to succeed. When planning such an industrial-scale project, the list of technologies and the associations of these technologies with each other are often complex and form a network. Discovery of this network of technologies is time consumi...